学习分配的尾巴行为是一个众所周知的困难问题。从定义上讲,尾部的样品数量很小,深层生成模型(例如归一化流量)倾向于集中于学习分布的身体。在本文中,我们专注于提高归一化流以正确捕获尾巴行为的能力,从而形成更准确的模型。我们证明,可以通过其基本分布的边缘的尾巴来控制自回归流的边际尾巴。这种理论上的见解使我们获得了一种基于灵活的基础分布和数据驱动线性层的新型流量。经验分析表明,所提出的方法提高了准确性(尤其是在分布的尾巴上),并能够生成重尾数据。我们证明了它在天气和气候示例中的应用,其中捕获尾巴行为至关重要。
translated by 谷歌翻译
在过去的几年里,已经表明,深度学习系统在对抗性示例的攻击中非常脆弱。基于神经网络的自动语音识别(ASR)系统也不例外。有针对性的和未确定的攻击可以以这样的方式修改音频输入信号,使得人类仍然识别相同的单词,而ASR系统被转向以预测不同的转录。在本文中,我们提出了一种防御机制,该防御机制通过在向ASR系统馈送输入之前,通过应用慢速特征分析,低通滤波器或两者来删除来自音频信号的快速变化功能。我们对在这种方式预处理的数据训练的混合ASR模型进行了实证分析。虽然所产生的模型在良性数据上表现得非常好,但它们对针对性的对抗攻击进行了更高的稳健性:我们的最终建议的模型显示了与基线模型类似的清洁数据的性能,同时具有比较强大的四倍。
translated by 谷歌翻译
尽管对安全机器学习的重要性,但神经网络的不确定性量化远未解决。估计神经不确定性的最先进方法通常是混合的,将参数模型与显式或隐式(基于辍学的)合并结合。我们采取另一种途径,提出一种新颖的回归任务的不确定量化方法,纯粹是非参数的。从技术上讲,它通过基于辍学的子网分布来捕获梯级不确定性。这是通过一个新目标来实现的,这使得标签分布与模型分布之间的Wasserstein距离最小化。广泛的经验分析表明,在生产更准确和稳定的不确定度估计方面,Wasserstein丢失在香草测试数据以及在分类转移的情况下表现出最先进的方法。
translated by 谷歌翻译
最近公布的知识图形嵌入模型的实施,培训和评估的异质性已经公平和彻底的比较困难。为了评估先前公布的结果的再现性,我们在Pykeen软件包中重新实施和评估了21个交互模型。在这里,我们概述了哪些结果可以通过其报告的超参数再现,这只能以备用的超参数再现,并且无法再现,并且可以提供洞察力,以及为什么会有这种情况。然后,我们在四个数据集上进行了大规模的基准测试,其中数千个实验和24,804 GPU的计算时间。我们展示了最佳实践,每个模型的最佳配置以及可以通过先前发布的最佳配置进行改进的洞察。我们的结果强调了模型架构,训练方法,丢失功能和逆关系显式建模的组合对于模型的性能来说至关重要,而不仅由模型架构决定。我们提供了证据表明,在仔细配置时,若干架构可以获得对最先进的结果。我们制定了所有代码,实验配置,结果和分析,导致我们在https://github.com/pykeen/pykeen和https://github.com/pykeen/benchmarking中获得的解释
translated by 谷歌翻译
We examine the role of memorization in deep learning, drawing connections to capacity, generalization, and adversarial robustness. While deep networks are capable of memorizing noise data, our results suggest that they tend to prioritize learning simple patterns first. In our experiments, we expose qualitative differences in gradient-based optimization of deep neural networks (DNNs) on noise vs. real data. We also demonstrate that for appropriately tuned explicit regularization (e.g., dropout) we can degrade DNN training performance on noise datasets without compromising generalization on real data. Our analysis suggests that the notions of effective capacity which are dataset independent are unlikely to explain the generalization performance of deep networks when trained with gradient based methods because training data itself plays an important role in determining the degree of memorization.
translated by 谷歌翻译
Large language models have demonstrated outstanding performance on a wide range of tasks such as question answering and code generation. On a high level, given an input, a language model can be used to automatically complete the sequence in a statistically-likely way. Based on this, users prompt these models with language instructions or examples, to implement a variety of downstream tasks. Advanced prompting methods can even imply interaction between the language model, a user, and external tools such as calculators. However, to obtain state-of-the-art performance or adapt language models for specific tasks, complex task- and model-specific programs have to be implemented, which may still require ad-hoc interaction. Based on this, we present the novel idea of Language Model Programming (LMP). LMP generalizes language model prompting from pure text prompts to an intuitive combination of text prompting and scripting. Additionally, LMP allows constraints to be specified over the language model output. This enables easy adaption to many tasks, while abstracting language model internals and providing high-level semantics. To enable LMP, we implement LMQL (short for Language Model Query Language), which leverages the constraints and control flow from an LMP prompt to generate an efficient inference procedure that minimizes the number of expensive calls to the underlying language model. We show that LMQL can capture a wide range of state-of-the-art prompting methods in an intuitive way, especially facilitating interactive flows that are challenging to implement with existing high-level APIs. Our evaluation shows that we retain or increase the accuracy on several downstream tasks, while also significantly reducing the required amount of computation or cost in the case of pay-to-use APIs (13-85% cost savings).
translated by 谷歌翻译
This project explores the feasibility of remote patient monitoring based on the analysis of 3D movements captured with smartwatches. We base our analysis on the Kinematic Theory of Rapid Human Movement. We have validated our research in a real case scenario for stroke rehabilitation at the Guttmann Institute5 (neurorehabilitation hospital), showing promising results. Our work could have a great impact in remote healthcare applications, improving the medical efficiency and reducing the healthcare costs. Future steps include more clinical validation, developing multi-modal analysis architectures (analysing data from sensors, images, audio, etc.), and exploring the application of our technology to monitor other neurodegenerative diseases.
translated by 谷歌翻译
To track the 3D locations and trajectories of the other traffic participants at any given time, modern autonomous vehicles are equipped with multiple cameras that cover the vehicle's full surroundings. Yet, camera-based 3D object tracking methods prioritize optimizing the single-camera setup and resort to post-hoc fusion in a multi-camera setup. In this paper, we propose a method for panoramic 3D object tracking, called CC-3DT, that associates and models object trajectories both temporally and across views, and improves the overall tracking consistency. In particular, our method fuses 3D detections from multiple cameras before association, reducing identity switches significantly and improving motion modeling. Our experiments on large-scale driving datasets show that fusion before association leads to a large margin of improvement over post-hoc fusion. We set a new state-of-the-art with 12.6% improvement in average multi-object tracking accuracy (AMOTA) among all camera-based methods on the competitive NuScenes 3D tracking benchmark, outperforming previously published methods by 6.5% in AMOTA with the same 3D detector.
translated by 谷歌翻译
Current differentiable renderers provide light transport gradients with respect to arbitrary scene parameters. However, the mere existence of these gradients does not guarantee useful update steps in an optimization. Instead, inverse rendering might not converge due to inherent plateaus, i.e., regions of zero gradient, in the objective function. We propose to alleviate this by convolving the high-dimensional rendering function that maps scene parameters to images with an additional kernel that blurs the parameter space. We describe two Monte Carlo estimators to compute plateau-free gradients efficiently, i.e., with low variance, and show that these translate into net-gains in optimization error and runtime performance. Our approach is a straightforward extension to both black-box and differentiable renderers and enables optimization of problems with intricate light transport, such as caustics or global illumination, that existing differentiable renderers do not converge on.
translated by 谷歌翻译
Powerful hardware services and software libraries are vital tools for quickly and affordably designing, testing, and executing quantum algorithms. A robust large-scale study of how the performance of these platforms scales with the number of qubits is key to providing quantum solutions to challenging industry problems. Such an evaluation is difficult owing to the availability and price of physical quantum processing units. This work benchmarks the runtime and accuracy for a representative sample of specialized high-performance simulated and physical quantum processing units. Results show the QMware cloud computing service can reduce the runtime for executing a quantum circuit by up to 78% compared to the next fastest option for algorithms with fewer than 27 qubits. The AWS SV1 simulator offers a runtime advantage for larger circuits, up to the maximum 34 qubits available with SV1. Beyond this limit, QMware provides the ability to execute circuits as large as 40 qubits. Physical quantum devices, such as Rigetti's Aspen-M2, can provide an exponential runtime advantage for circuits with more than 30. However, the high financial cost of physical quantum processing units presents a serious barrier to practical use. Moreover, of the four quantum devices tested, only IonQ's Harmony achieves high fidelity with more than four qubits. This study paves the way to understanding the optimal combination of available software and hardware for executing practical quantum algorithms.
translated by 谷歌翻译